AITopics | open source dataset

open source dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Abstractive Text Summarization for Resumes With Cutting Edge NLP Transformers and LSTM

Mercan, Öykü Berfin, Cavsak, Sena Nur, Deliahmetoglu, Aysu, Tanberk, Senem

arXiv.org Artificial Intelligence, Jun-23-2023

Text summarization is a fundamental task in natural language processing that aims to condense large amounts of textual information into concise and coherent summaries. With the exponential growth of content and the need to extract key information efficiently, text summarization has gained significant attention in recent years. In this study, the performance of LSTM and of the pre-trained T5, Pegasus, BART and BART-Large models was evaluated on open source datasets (XSum, CNN/Daily Mail, Amazon Fine Food Reviews and News Summary) and on a purpose-built resume dataset. The resume dataset contains 75 resumes covering information such as languages, education, experience, personal details and skills. The primary objective of this research was to classify resume text. Various techniques, including LSTM, pre-trained models and fine-tuned models, were assessed on the resume dataset. The BART-Large model fine-tuned on the resume dataset gave the best performance.
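The abstract does not say which metric was used to compare the generated summaries. ROUGE is a common choice for summarization, so the sketch below, an assumption rather than the authors' evaluation code, computes a simplified ROUGE-1 F1 (clipped unigram overlap) between a generated summary and a reference. The function name `rouge1_f1` and the example strings are hypothetical; official ROUGE packages add stemming, tokenization rules and other options.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between a candidate and a reference summary.

    Assumption for illustration: lowercase whitespace tokenization, no stemming,
    no stopword removal (unlike official ROUGE implementations).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counts at most min(count_cand, count_ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model output for a single resume.
reference = "experienced python developer with five years in data engineering"
candidate = "python developer with five years of experience"
print(round(rouge1_f1(candidate, reference), 3))  # prints 0.625
```

In this invented pair, 5 of the 7 candidate words appear among the 9 reference words, so precision is 5/7, recall is 5/9 and their harmonic mean is 0.625.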

artificial intelligence, machine learning, natural language, (15 more...)

arXiv:2306.13315

Country:

Asia > Singapore (0.05)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

22 open source datasets to boost AI modeling

#artificialintelligence, Apr-8-2022, 21:38:54 GMT

Some say "data is the new oil" with an air of seriousness. While the phrase may capture a certain truth about the modern digital economy, it fails to account for the way bits can be copied again and again. Sometimes the ease of sharing creates a distinct absence of scarcity, and that changes the economics of the entire game. One of the best ways to see this is to tap into some of the open source datasets proliferating on the Internet.

dataset, information, open source dataset, (15 more...)

Country:

Europe > United Kingdom (0.14)
North America > United States > New York (0.05)
Pacific Ocean > North Pacific Ocean > Puget Sound (0.04)
(2 more...)

Industry:

Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.50)
Information Technology > Communications > Social Media (0.48)
Information Technology > Communications > Networks (0.34)

Council Post: What Exactly Is Artificial Intelligence? (Hint: It's All About The Datasets)

#artificialintelligence, Dec-15-2021, 17:55:36 GMT

Boris Kontsevoi is a technology executive, President and CEO of Intetics Inc., a global software engineering and data processing company. Many of today's emerging technologies and products rely heavily on artificial intelligence (AI) and machine learning (ML). And while hundreds of articles have been written about this topic, very few get into the nitty-gritty of what truly powers AI: data. The definition of artificial intelligence varies depending on who you ask. A data scientist will give a very different answer than someone who is only peripherally aware of AI.

ai application, dataset, intelligence, (8 more...)

Industry: Information Technology (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Open Source Datasets for Computer Vision - KDnuggets

#artificialintelligence, Aug-26-2021, 06:00:38 GMT

Computer Vision (CV) is one of the most exciting subfields within the Artificial Intelligence (AI) and Machine Learning (ML) domain. It is a major component of many modern AI/ML pipelines, and it is transforming almost every industry, changing the way machines and business systems work. Academically, CV has been a well-established area of computer science for decades, and a great deal of research has gone into advancing it. However, deep neural networks have recently revolutionized the field and fueled its accelerated growth. In this article, we discuss some of the most popular and effective datasets used in Deep Learning (DL) to train state-of-the-art ML systems for CV tasks.

computer vision, cv task, dataset, (14 more...)

Industry: Information Technology (0.30)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

An A.I. Training Tool Has Been Passing Its Bias to Algorithms for Almost Two Decades

#artificialintelligence, Jun-27-2021, 18:10:57 GMT

Night after night, Fien de Meulder sat in front of her Linux computer flagging names of people, places, and organizations in sentences pulled from Reuters newswire articles. De Meulder and her colleague, Erik Tjong Kim Sang, worked in language technology at the University of Antwerp. It was 2003, and a 60-hour workweek was typical in academic circles. She chugged Coke to stay awake. The goal: develop an open source dataset to help machine learning (ML) models learn to identify and categorize entities in text.
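The flagging work described above yields token-level entity labels; CoNLL-2003 distinguishes persons (PER), locations (LOC), organizations (ORG) and miscellaneous entities (MISC). As a minimal sketch, assuming BIO (IOB2) tags (copies of the dataset circulate with different tagging schemes, so check the one you have), the hypothetical helper `bio_to_spans` below turns a tag sequence back into entity spans. The sentence and its tags are invented for illustration and are not taken from the dataset.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) spans, with `end` exclusive.

    An I- tag whose type differs from the open span (or that appears with no
    open span) is treated leniently as the start of a new entity.
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span that runs to the end
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = (i, tag[2:]) if starts_new else (None, None)
    return spans


# Invented example sentence with hand-assigned BIO tags.
tokens = ["Fien", "de", "Meulder", "worked", "at", "the", "University", "of", "Antwerp", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
for etype, s, e in bio_to_spans(tags):
    print(etype, " ".join(tokens[s:e]))
# prints:
# PER Fien de Meulder
# ORG University of Antwerp
```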

conll-2003, dataset, scale ai, (10 more...)

Country:

Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.25)
North America > United States > Massachusetts (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Council Post: What Exactly Is Artificial Intelligence? (Hint: It's All About The Datasets)

#artificialintelligence, May-4-2021, 19:10:21 GMT

ai application, dataset, intelligence, (8 more...)

Industry: Information Technology (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

10 Best Entry Level Machine Learning Tutorials

#artificialintelligence, Oct-21-2020, 13:35:33 GMT

The field of machine learning is becoming easier and easier to enter thanks to readily available tools, a wide range of open source datasets, and a community open to sharing ideas and giving advice. Almost everything you need to get started is online; it's just a matter of finding it. To help entry-level enthusiasts get their heads around different ML systems and how to implement them, I've put together some of my favorite machine learning tutorials. Each of the following articles gives a brief introduction to the systems it covers, talks you through the cleaning, testing, and implementation process, and links to datasets and GitHub repositories so you can follow the same steps on your own. This detailed guide explores the transformer architecture by building a translator that takes an English sentence and translates it into German. It covers data preprocessing and model training, and wraps things up by looking at the results and what could be done to improve the system.
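The guide's own code is not reproduced here. As a framework-free illustration of the mechanism at the heart of the transformer, the sketch below computes scaled dot-product attention, softmax(QKᵀ/√d_k)V, on small nested lists. The helper names and toy matrices are hypothetical; a real English-to-German translator would use batched tensors, multiple heads, masking and learned projections from a deep learning library.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: each output cell is a row of `a` dotted with a column of `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # one row of query-key similarities per query
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy example: one query that matches the first key more closely than the second.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out, w = scaled_dot_product_attention(q, k, v)
print([round(x, 3) for x in w[0]], [round(x, 3) for x in out[0]])
```

Because the query points in the same direction as the first key, the first value row receives the larger weight (about 0.67), and the output is pulled toward `[10, 0]`. Dividing by √d_k keeps the scores from growing with the vector dimension, which would otherwise push the softmax toward near one-hot weights.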

artificial intelligence, deep learning, machine learning, (12 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Genre: Instructional Material (0.49)

Industry:

Media (0.31)
Information Technology (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti

Gordienko, Nikita, Gang, Peng, Gordienko, Yuri, Zeng, Wei, Alienin, Oleg, Rokovyi, Oleksandr, Stirenko, Sergii

arXiv.org Machine Learning, Aug-31-2018

Machine learning techniques are presented for the automatic recognition of historical letters (11th to 18th centuries) carved on the stone walls of St. Sophia Cathedral in Kyiv (Ukraine). A new image dataset of these carved Glagolitic and Cyrillic letters (CGCL) was assembled and pre-processed for recognition and prediction by machine learning methods. The dataset consists of more than 4000 images covering 34 types of letters. Exploratory data analysis of the CGCL and notMNIST datasets showed that the carved letters can hardly be differentiated by dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE), because stone carving represents letters less clearly than handwriting. Multinomial logistic regression (MLR) and 2D convolutional neural network (CNN) models were applied. The MLR model achieved area under the receiver operating characteristic curve (ROC AUC) values of at least 0.92 for notMNIST and 0.60 for CGCL. The CNN model gave AUC values close to 0.99 for both notMNIST and CGCL (despite the much smaller size and lower quality of CGCL compared with notMNIST) when heavy lossy data augmentation was applied. The CGCL dataset has been published as an open source resource for the data science community.
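To make the reported AUC figures concrete, the sketch below computes ROC AUC as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. Because CGCL has 34 letter classes, it also averages one-vs-rest AUCs across classes; the abstract does not say how its AUC values were aggregated, so macro averaging is an assumption, and the function names and toy data are hypothetical. The pairwise loop is O(P·N), which is fine for illustration but slow on large datasets.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Average of one-vs-rest AUCs, treating each class in turn as the positive class."""
    per_class: List[float] = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in probs]  # predicted probability of class c
        per_class.append(roc_auc(labels, scores))
    return sum(per_class) / n_classes


print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # prints 0.75

# Toy 3-class example where every class's true samples get the highest score.
y_true = [0, 1, 2]
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
print(macro_ovr_auc(y_true, probs, n_classes=3))  # prints 1.0
```

In the binary example, three of the four positive/negative pairs are ordered correctly (0.35 ranks below the negative 0.4), giving 0.75. An AUC of 0.5 corresponds to random ranking, which puts the MLR model's 0.60 on CGCL in context.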

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv:1808.10862

Country:

Asia (0.94)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.41)

Genre: Research Report (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
